Abstract

In this paper, we deal with the problem of calibrating thresholding rules in the setting of
Poisson intensity estimation. By using sharp concentration inequalities, oracle inequalities are
derived and we establish the optimality of our estimate up to a logarithmic term. This result is
proved under mild assumptions and we do not impose any condition on the support of the signal
to be estimated. Our procedure is based on data-driven thresholds. As usual, they depend on
a threshold parameter $\gamma$ whose optimal value is hard to estimate from the data. Our main
concern is to provide some theoretical and numerical results to handle this issue. In particular,
we establish the existence of a minimal threshold parameter from the theoretical point of view:
taking $\gamma < 1$ deteriorates oracle performances of our procedure. In the same spirit, we establish
the existence of a maximal threshold parameter and our theoretical results point out the optimal
range $\gamma \in [1, 12]$. Then, we carry out a numerical study that shows that choosing $\gamma$ larger than 1
but close to 1 is a fairly good choice. Finally, we compare our procedure with classical ones,
revealing the harmful role of the support of functions when estimated by classical procedures.
1 Introduction

In this paper, we consider the problem of estimating the intensity of a Poisson process. From a
practical point of view, various methodologies have already been proposed. See for instance Rudemo
[2j], who proposed kernel and data-driven histogram rules calibrated by cross-validation.
Thresholding algorithms have been proposed by Donoho, who modified the universal thresholding
procedure proposed in [13] by using the Anscombe transform, and by Kolaczyk [20], whose procedure
is based on the tails of the distribution of the noisy wavelet coefficients of the intensity. Finally,
let us cite penalized model selection type estimators built by Willett and Nowak [2^] based on
models spanned by piecewise polynomials. From the theoretical point of view, Cavalier and Koo
[10] derived minimax rates on Besov balls by using wavelet thresholding. In the oracle approach,
various optimal adaptive model selection rules have also been built by Baraud and Birgé [a], Birgé
and Reynaud-Bouret [2j]. Let us mention that these procedures are also minimax provided the
intensity to be estimated is assumed to be supported by [0, 1].

In a previous paper, we refined classical wavelet thresholding algorithms by proposing local
data-driven thresholds (see [23]). Under very mild assumptions, the corresponding procedure achieves
optimal oracle inequalities and optimal minimax rates up to a logarithmic term. In particular, these
results are true even if the support of the intensity is unknown or infinite, which is rarely considered
in the literature. In [23], we give many arguments to justify this unusual setting and we illustrate the
influence of the support on minimax rates by showing how these rates deteriorate when the sparsity
of the intensity decreases. So, this algorithm, which is easily implementable, automatically adapts
to the unknown regularity of the signal as usual, but also to the unknown support, which is not
classical. The main goal of this paper is to study the optimal calibration of the procedure studied
in [23] from both theoretical and practical points of view. For this purpose, the next subsection
briefly describes this procedure (Section 2 gives accurate definitions) and Section 1.2 presents the
calibration issue.



1.1 A brief description of our procedure 

We observe a Poisson process $N$ whose mean measure $\mu$ is finite on the real line $\mathbb{R}$ and is absolutely
continuous with respect to the Lebesgue measure (see Section 7.1 where we recall classical facts on
Poisson processes). Given $n$ a positive integer, we define the intensity of $N$ as the function $f$ that
satisfies

\[ f(x) = \frac{d\mu_x}{n\,dx}. \]

So, the total number of points of the process $N$, denoted $\mathrm{card}(N)$, satisfies

\[ \mathbb{E}[\mathrm{card}(N)] = n\|f\|_1 < \infty. \]

In particular, $\mathrm{card}(N)$ is finite almost surely. In the sequel, $f$ will be held fixed and $n$ will go to
$+\infty$. The introduction of $n$ could seem artificial, but it allows us to present the following asymptotic
theoretical results in a meaningful way since the mean of the number of points of $N$ goes to $\infty$ when
$n \to \infty$. In addition, our framework is equivalent to the observation of an $n$-sample of a Poisson
process with common intensity $f$ with respect to the Lebesgue measure. The goal of this paper is
to estimate $f$ by observing the points of $N$.
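To make the observation model concrete, here is a minimal sketch of how such a process could be simulated in the simplest case $f = 1_{[0,1]}$ (the signal studied in Section 4). The function name `simulate_uniform_poisson` is hypothetical, and the construction by independent exponential gaps of rate $n$ is the standard one for a homogeneous Poisson process; it is an illustration, not the simulation code used for the numerical study.

```python
import random


def simulate_uniform_poisson(n, rng):
    """Points of a Poisson process on [0, 1) with intensity n * 1_[0,1].

    Successive points are separated by independent exponential gaps of rate n,
    so the number of points is Poisson distributed with mean n * ||f||_1 = n.
    """
    points = []
    t = rng.expovariate(n)
    while t < 1.0:
        points.append(t)
        t += rng.expovariate(n)
    return points
```

For instance, `simulate_uniform_poisson(1024, random.Random(0))` returns a sorted list whose length fluctuates around 1024, in line with $\mathbb{E}[\mathrm{card}(N)] = n\|f\|_1$.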

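First, we decompose the signal $f$ to be estimated as follows:

\[ f = \sum_{\lambda\in\Lambda} \beta_\lambda \tilde\varphi_\lambda \quad\text{with}\quad \beta_\lambda = \int \varphi_\lambda(x) f(x)\,dx, \]

where $\{(\varphi_\lambda)_{\lambda\in\Lambda}, (\tilde\varphi_\lambda)_{\lambda\in\Lambda}\}$ denotes a biorthogonal wavelet basis. In our paper, we mainly focus
on the Haar basis (in this case, $\tilde\varphi_\lambda = \varphi_\lambda$ for any $\lambda$) or on a special case of biorthogonal spline
wavelet bases (in this case, $\varphi_\lambda$ is piecewise constant and $\tilde\varphi_\lambda$ is regular). See Section 7.2 where we
recall well-known facts on biorthogonal wavelet bases or Cohen, Daubechies and Feauveau [11] for
a complete overview on such families. As usual in the wavelet setting, our goal is to estimate the
wavelet coefficients $(\beta_\lambda)_\lambda$ by thresholding empirical wavelet coefficients $(\hat\beta_\lambda)_\lambda$ defined as

\[ \hat\beta_\lambda = \frac{1}{n} \sum_{T\in N} \varphi_\lambda(T). \]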

Thresholding procedures have been introduced by Donoho and Johnstone [13]. Their main idea
is that it is sufficient to keep a small amount of the coefficients to have a good estimation of the
function $f$. In our setting, the estimate of $f$ takes the form

\[ \tilde f_{n,\gamma} = \sum_{\lambda\in\Gamma_n} \hat\beta_\lambda 1_{\{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}\}} \tilde\varphi_\lambda, \]

where $\Gamma_n$ is defined in (2.6). The thresholding procedure is detailed and discussed in Section 2. We
just mention here the form of the data-driven threshold $\eta_{\lambda,\gamma}$:

\[ \eta_{\lambda,\gamma} = \sqrt{2\gamma \tilde V_{\lambda,n} \log n} + \frac{\gamma \log n}{3n} \|\varphi_\lambda\|_\infty, \]

where $\tilde V_{\lambda,n}$ is a sharp estimate of $\mathrm{Var}(\hat\beta_\lambda)$ defined in (2.5) and where $\gamma$ is a constant to be chosen.
As explained in Section 2, we have for most of the indices $\lambda$ playing a key role for estimation:

\[ \eta_{\lambda,\gamma} \approx \sqrt{2\gamma \tilde V_{\lambda,n} \log n}. \]

In this case, $\eta_{\lambda,\gamma}$ has a form close to the universal threshold $\eta^U$ proposed by Donoho and Johnstone
[13] in the Gaussian regression framework:

\[ \eta^U = \sqrt{2\sigma^2 \log n}, \]

where $\sigma^2$ (assumed to be known in the Gaussian framework) is the variance of each noisy wavelet
coefficient. Note, however, that our procedure depends on the so-called threshold parameter $\gamma$ that
has to be properly chosen. The next section, which describes calibration issues in a general way,
discusses this question.



1.2 The calibration issue 

The major concern of this paper is the study of the calibration of the threshold parameter $\gamma$: how
should this parameter be chosen to obtain good results in both theory and practice? As usual, it
can be proved that $\tilde f_{n,\gamma}$ achieves good theoretical performances from the minimax or oracle points of view
(see [23] or Theorem 1) provided $\gamma$ is large enough. Such an assumption is very classical in the
literature (see for instance [1], [10], [14] or [17]). Unfortunately, most of the time, the theoretical
choice of the threshold parameter is not suitable for practical issues. More precisely, this choice
is often too conservative. See for instance Juditsky and Lambert-Lacroix [17], who illustrate this
statement in Remark 5 of their paper: their threshold parameter, denoted $\lambda$, has to be larger than
14 to obtain theoretical results, but they suggest using $\lambda \in [\sqrt{2}, 2]$ for practical issues. So, one of
the main goals of this paper is to fill the gap between the optimal parameter choice provided by
theoretical results on the one hand and by a simulation study on the other hand.

Only a few papers have been devoted to theoretical calibration of statistical procedures. In the
model selection setting, the issue of calibration has been addressed by Birgé and Massart. They
considered penalized estimators in a Gaussian homoscedastic regression framework with known
variance, and calibration of penalty constants is based on the following methodology. They showed
that there exists a minimal penalty in the sense that taking smaller penalties leads to inconsistent
estimation procedures. Under some conditions, they further proved that the optimal penalty is twice
the minimal penalty. This relationship characterizes the "slope heuristic" of Birgé and Massart 0].
Such a method has been successfully applied for practical purposes in [21]. Baraud, Giraud and
Huet 0] (respectively Arlot and Massart [2]) generalized these results when the variance is unknown
(respectively for non-Gaussian or heteroscedastic data). These approaches constitute alternatives
to popular cross-validation methods (see [1] or [H]). For instance, V-fold cross-validation (see [15])
is widely used to calibrate procedure parameters but its computational cost can be high.

1.3 Our results

The starting point of our results is the oracle inequality stated in Section 2. Theorem 1 shows that
the estimate $\tilde f_{n,\gamma}$ achieves the oracle risk up to a logarithmic term. This result is true as soon as
$\gamma > 1$ and $f \in L_1 \cap L_2$. In particular, nothing is assumed with respect to the support of $f$ or
$\|f\|_\infty$: our result remains true if $\|f\|_\infty = \infty$ and if the support of $f$ is unknown or infinite. The
oracle inequality of Theorem 1 is refined in Section 3 where $f$ is assumed to belong to a special
class denoted $\mathcal{F}_n(R)$ whose signals have only a finite number of non-zero wavelet coefficients (see
Theorem 2).

Then, in the perspective of calibrating thresholding rules, we consider theoretical performances
of $\tilde f_{n,\gamma}$ with $\gamma < 1$ by using the Haar basis. For the signal $f = 1_{[0,1]}$, Theorem 1 shows that $\tilde f_{n,\gamma}$ with
$\gamma > 1$ achieves the rate $(\log n)/n$. But the lower bound of Theorem 3 shows that the rate of $\tilde f_{n,\gamma}$ with
$\gamma < 1$ is larger than $n^{-\delta}$ for some $\delta < 1$. So, as in ^ for instance, we prove the existence of a minimal
threshold parameter: $\gamma = 1$. Of course, the next step concerns the existence of a maximal threshold
parameter. This issue is answered by Theorem 4, which studies the maximal ratio between the risk
of $\tilde f_{n,\gamma}$ and the oracle risk on $\mathcal{F}_n(R)$. We derive a lower bound that shows that taking $\gamma > 12$ leads
to worse rate constants: this is consequently a bad choice.

The optimal choice for $\gamma$ is derived from a numerical study, keeping in mind that the theory
points out the range $\gamma \in [1, 12]$. Some simulations are provided for estimating various signals by
considering either the Haar basis or a particular biorthogonal spline wavelet basis (see Section 5).
Our numerical results show that choosing $\gamma$ larger than 1 but close to 1 is a fairly good choice, which
corroborates theoretical results. Actually, our simulation study suggests that Theorem 3 remains
true for all signals of $\mathcal{F}_n(R)$, whatever basis is used to decompose the signals.

Finally, we carry out a comparative study with other competitive procedures. We show that the
thresholding rule proposed in this paper outperforms universal thresholding (when combined with
the Anscombe transform) and Kolaczyk's procedure. In addition, the robustness of our procedure with
respect to the support issue is emphasized and we show the harmful role played by large supports
of signals when estimation is performed by other classical procedures.

1.4 Overview of the paper 

Section 2 defines the thresholding estimate $\tilde f_{n,\gamma}$ and studies its properties under the oracle approach.
In Section 3, we refine this study on the set of positive functions that can be decomposed on a finite
combination of the basis. Calibration of thresholds is discussed in Section 4 and Section 5 illustrates
our theoretical results by some simulations. Section 6 is devoted to the proofs of the results. Finally,
Section 7 recalls well-known facts on Poisson processes and biorthogonal wavelet bases.

2 Data-driven thresholding rules and oracle inequalities 

The goal of this section is to specify our thresholding rule. For this purpose, we assume that
$f$ belongs to $L_2(\mathbb{R})$ and we use the decomposition of $f$ on one of the biorthogonal wavelet bases
described in Section 7.2. We recall that, as for classical orthonormal wavelet bases, biorthogonal wavelet
bases are generated by dilations and translations of father and mother wavelets. But considering
biorthogonal wavelets allows us to distinguish, if necessary, wavelets for analysis (that are piecewise
constant functions in this paper) and wavelets for reconstruction with a prescribed number of
continuous derivatives. Then, the decomposition of $f$ on a biorthogonal wavelet basis takes the
following form:

\[ f = \sum_{k\in\mathbb{Z}} \alpha_k \tilde\phi_k + \sum_{j\ge 0} \sum_{k\in\mathbb{Z}} \beta_{j,k} \tilde\psi_{j,k}, \tag{2.1} \]

where for any $j \ge 0$ and any $k \in \mathbb{Z}$,

\[ \alpha_k = \int_{\mathbb{R}} f(x)\phi_k(x)\,dx, \qquad \beta_{j,k} = \int_{\mathbb{R}} f(x)\psi_{j,k}(x)\,dx. \]

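See Section 7.2 for further details. To shorten mathematical expressions, we set

\[ \Lambda = \{\lambda = (j,k) : \ j \ge -1, \ k \in \mathbb{Z}\} \]

and for any $\lambda \in \Lambda$, $\varphi_\lambda = \phi_k$ (respectively $\tilde\varphi_\lambda = \tilde\phi_k$) if $\lambda = (-1,k)$ and $\varphi_\lambda = \psi_{j,k}$ (respectively
$\tilde\varphi_\lambda = \tilde\psi_{j,k}$) if $\lambda = (j,k)$ with $j \ge 0$. Similarly, $\beta_\lambda = \alpha_k$ if $\lambda = (-1,k)$ and $\beta_\lambda = \beta_{j,k}$ if $\lambda = (j,k)$
with $j \ge 0$. Now, (2.1) can be rewritten as

\[ f = \sum_{\lambda\in\Lambda} \beta_\lambda \tilde\varphi_\lambda \quad\text{with}\quad \beta_\lambda = \int \varphi_\lambda(x) f(x)\,dx. \tag{2.2} \]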



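In particular, (2.2) holds for the Haar basis that will play a special role in this paper, where in this
case $\tilde\varphi_\lambda = \varphi_\lambda$. Now, let us define the thresholding estimate of $f$ by using the properties of Poisson
processes. First, we introduce for any $\lambda \in \Lambda$ the natural estimator of $\beta_\lambda$ defined by

\[ \hat\beta_\lambda = \frac{1}{n} \int \varphi_\lambda(x)\,dN_x, \tag{2.3} \]

where we denote by $dN$ the discrete random measure $\sum_{T\in N} \delta_T$ and for any compactly supported
function $g$,

\[ \int g(x)\,dN_x = \sum_{T\in N} g(T). \]

So, the estimator $\hat\beta_\lambda$ is unbiased: $\mathbb{E}(\hat\beta_\lambda) = \beta_\lambda$. Then, given some parameter $\gamma > 0$, we define the
threshold $\eta_{\lambda,\gamma}$ mentioned in the Introduction as

\[ \eta_{\lambda,\gamma} = \sqrt{2\gamma \tilde V_{\lambda,n} \log n} + \frac{\gamma \log n}{3n} \|\varphi_\lambda\|_\infty, \tag{2.4} \]

with

\[ \tilde V_{\lambda,n} = \hat V_{\lambda,n} + \sqrt{2\gamma \log n \, \hat V_{\lambda,n} \frac{\|\varphi_\lambda\|_\infty^2}{n^2}} + 3\gamma \log n \, \frac{\|\varphi_\lambda\|_\infty^2}{n^2}, \tag{2.5} \]

where

\[ \hat V_{\lambda,n} = \frac{1}{n^2} \int \varphi_\lambda^2(x)\,dN_x. \]

Note that $\hat V_{\lambda,n}$ satisfies $\mathbb{E}(\hat V_{\lambda,n}) = V_{\lambda,n}$, where

\[ V_{\lambda,n} = \mathrm{Var}(\hat\beta_\lambda) = \frac{1}{n} \int \varphi_\lambda^2(x) f(x)\,dx. \]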
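As a quick check of the two moment identities above, one can use the standard mean and variance formulas for integrals against a Poisson process with intensity $nf$ with respect to the Lebesgue measure. The function $g$ below is a generic integrable and square-integrable function, introduced here only for illustration.

```latex
\begin{align*}
\mathbb{E}\Big[\int g(x)\,dN_x\Big] &= n\int g(x) f(x)\,dx, &
\mathrm{Var}\Big(\int g(x)\,dN_x\Big) &= n\int g^2(x) f(x)\,dx,\\
\mathrm{Var}(\hat\beta_\lambda) &= \frac{1}{n^2}\cdot n\int \varphi_\lambda^2(x) f(x)\,dx = V_{\lambda,n}, &
\mathbb{E}(\hat V_{\lambda,n}) &= \frac{1}{n^2}\cdot n\int \varphi_\lambda^2(x) f(x)\,dx = V_{\lambda,n}.
\end{align*}
```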

Finally, with

\[ \Gamma_n = \{\lambda = (j,k) \in \Lambda : \ j \le j_0\}, \tag{2.6} \]

where $j_0 = j_0(n)$ is the integer such that $2^{j_0} \le n < 2^{j_0+1}$, we set for any $\lambda \in \Lambda$,

\[ \tilde\beta_\lambda = \hat\beta_\lambda 1_{\{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}\}} 1_{\{\lambda\in\Gamma_n\}} \]

and $\tilde\beta = (\tilde\beta_\lambda)_{\lambda\in\Lambda}$. Finally, the estimator of $f$ is

\[ \tilde f_{n,\gamma} = \sum_{\lambda\in\Lambda} \tilde\beta_\lambda \tilde\varphi_\lambda \tag{2.7} \]

and only depends on the choice of $\gamma$. When the Haar basis is used, the estimate is denoted $\tilde f^H_{n,\gamma}$
and its wavelet coefficients are denoted $\tilde\beta^H = (\tilde\beta^H_\lambda)_{\lambda\in\Lambda}$. The threshold $\eta_{\lambda,\gamma}$ seems to be defined in a
rather complicated manner but we can notice the following fact. Given $\lambda \in \Gamma_n$, when there exists a
constant $c_0 > 0$ such that $f(x) \ge c_0$ for $x$ in the support of $\varphi_\lambda$ satisfying $\|\varphi_\lambda\|_\infty^2 = o_n(n(\log n)^{-1})$,
then, with large probability, the deterministic term of (2.4) is negligible with respect to the random
one. In this case we asymptotically derive

\[ \eta_{\lambda,\gamma} \approx \sqrt{2\gamma \tilde V_{\lambda,n} \log n}, \tag{2.8} \]

as stated in the Introduction. Actually, the deterministic term of (2.4) allows us to consider $\gamma$ close to 1
and to control large deviation terms for high resolution levels. In the same spirit, $V_{\lambda,n}$ is slightly
overestimated and we consider $\tilde V_{\lambda,n}$ instead of $\hat V_{\lambda,n}$ to define the threshold.
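The quantities entering the threshold can be computed directly from the observed points. The following is a minimal sketch for the Haar basis, under the assumption (not restated in this section) that the wavelets are normalized as $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$ and $\phi_k(x) = \phi(x - k)$, so that $\|\varphi_\lambda\|_\infty = 2^{j/2}$ for $j \ge 0$ and 1 for $j = -1$. The names `haar_value` and `data_driven_threshold` are hypothetical, and half-open intervals are used at the breakpoints, which only changes the functions on a set of measure zero.

```python
import math


def haar_value(j, k, x):
    """Value at x of the Haar function indexed by (j, k); j = -1 is the father wavelet."""
    if j == -1:
        return 1.0 if k <= x < k + 1 else 0.0
    u = (2 ** j) * x - k
    if 0.0 <= u < 0.5:
        return 2.0 ** (j / 2)
    if 0.5 <= u < 1.0:
        return -(2.0 ** (j / 2))
    return 0.0


def data_driven_threshold(j, k, points, n, gamma):
    """Return (beta_hat, v_hat, v_tilde, eta) following (2.3), (2.4) and (2.5)."""
    sup_norm = 1.0 if j == -1 else 2.0 ** (j / 2)
    log_n = math.log(n)
    values = [haar_value(j, k, t) for t in points]
    beta_hat = sum(values) / n                        # empirical coefficient (2.3)
    v_hat = sum(v * v for v in values) / n ** 2       # unbiased variance estimate
    v_tilde = (v_hat
               + math.sqrt(2 * gamma * log_n * v_hat * sup_norm ** 2 / n ** 2)
               + 3 * gamma * log_n * sup_norm ** 2 / n ** 2)              # (2.5)
    eta = math.sqrt(2 * gamma * v_tilde * log_n) + gamma * log_n * sup_norm / (3 * n)  # (2.4)
    return beta_hat, v_hat, v_tilde, eta
```

A coefficient is then kept when `abs(beta_hat) >= eta` and the index belongs to $\Gamma_n$. Note that when no point falls in the support of $\varphi_\lambda$, the threshold reduces to its deterministic part, $\gamma \log n\,(\sqrt{6} + 1/3)\,\|\varphi_\lambda\|_\infty / n$, so the coefficient is set to zero.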



The performance of this procedure has been investigated from the oracle point of view in [23]. We
recall that in the context of wavelet function estimation by thresholding, the oracle does not tell us
the true function, but tells us the coefficients that have to be kept. This "estimator" obtained with
the aid of an oracle is not a true estimator, of course, since it depends on $f$. But it represents an ideal
for the particular estimation method. The goal of the oracle approach is to derive true estimators
which can essentially "mimic" the performance of the "oracle estimator". In our framework, it is
easy to see that the oracle estimate is $\bar f = \sum_{\lambda\in\Gamma_n} \bar\beta_\lambda \tilde\varphi_\lambda$, where $\bar\beta_\lambda = \hat\beta_\lambda 1_{\{\beta_\lambda^2 > V_{\lambda,n}\}}$ satisfies

\[ \mathbb{E}\big((\bar\beta_\lambda - \beta_\lambda)^2\big) = \min(\beta_\lambda^2, V_{\lambda,n}). \]

By keeping the coefficients $\hat\beta_\lambda$ larger than the thresholds defined in (2.4), our estimator has a risk
that is not larger than the oracle risk, up to a logarithmic term, as stated by the following key
result.

Theorem 1. Let us consider a biorthogonal wavelet basis satisfying the properties described in
Section 7.2. If $\gamma > 1$, then $\tilde f_{n,\gamma}$ satisfies the following oracle inequality: for $n$ large enough

\[ \mathbb{E}\big(\|\tilde f_{n,\gamma} - f\|_2^2\big) \le C_1 \log n \Big[ \sum_{\lambda\in\Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n}) + \sum_{\lambda\notin\Gamma_n} \beta_\lambda^2 \Big] + \frac{C_2}{n}, \tag{2.9} \]

where $C_1$ is a positive constant depending only on $\gamma$ and on the functions that generate the
biorthogonal wavelet basis. $C_2$ is also a positive constant depending on $\|f\|_1$ and on the functions that
generate the basis.

Following the oracle point of view of Donoho and Johnstone, Theorem 1 shows that our procedure
is optimal up to the logarithmic factor. This logarithmic term is in some sense unavoidable. It is
the price we pay for adaptivity (i.e. for not knowing the coefficients that we must keep). Our result
is true provided $f \in L_1(\mathbb{R}) \cap L_2(\mathbb{R})$. So, assumptions on $f$ are very mild here. This is not the case
for most of the results for non-parametric estimation procedures where one assumes that $\|f\|_\infty < \infty$
and that $f$ has a compact support. Note in addition that this support and $\|f\|_\infty$ are often assumed to be known
in the literature. On the contrary, in Theorem 1, $f$ and its support can be unbounded. So, we
make as few assumptions as possible. This is allowed by considering random thresholding with the
data-driven thresholds defined in (2.4). This result is proved in [23] where in addition optimality
properties of the estimate (2.7) under the minimax approach are established.

A glance at the proof of Theorem 1 shows that the constants $C_1$ and $C_2$ strongly depend on
$\gamma$. Actually, without further assumptions on $f$, the constants $C_1$ and $C_2$ blow up when $\gamma$ tends to
1. In particular, such an oracle inequality is not sharp enough for some calibration issues. In the
next section, we investigate this problem and we derive sharp oracle inequalities for a large class
of functions. Furthermore, the upper bound in (3.2) depends on absolute constants whose size is
acceptable.



3 Study on a special class of functions 

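In the sequel, we consider the Haar basis and the estimator $\tilde f^H_{n,\gamma}$. We restrict our study to the estimation
of the functions of $\mathcal{F}$, defined as the set of positive functions that can be decomposed on a finite
combination of $(\tilde\varphi_\lambda)_{\lambda\in\Lambda}$:

\[ \mathcal{F} = \Big\{ f = \sum_{\lambda\in\Lambda} \beta_\lambda \tilde\varphi_\lambda \ge 0 : \ \mathrm{card}\{\lambda \in \Lambda : \beta_\lambda \ne 0\} < \infty \Big\}. \]

To study sharp performances of our procedure, we introduce a subclass of the class $\mathcal{F}$: for any $n$
and any radius $R$, we define:

where for any $\lambda$, we set

\[ F_\lambda = \int_{\mathrm{supp}(\varphi_\lambda)} f(x)\,dx \quad\text{and}\quad \mathrm{supp}(\varphi_\lambda) = \{x \in \mathbb{R} : \varphi_\lambda(x) \ne 0\}, \]

which allows us to establish a decomposition of $\mathcal{F}$. Indeed, we have the following result proved in
Section 6.1.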



Proposition 1. When $n$ (or $R$) increases, $(\mathcal{F}_n(R))_{n,R}$ is a non-decreasing sequence of sets. In
addition, we have:

\[ \bigcup_n \bigcup_R \mathcal{F}_n(R) = \mathcal{F}. \]

The definition of $\mathcal{F}_n(R)$ especially relies on the technical condition

\[ F_\lambda \ge \frac{(\log n)(\log\log n)}{n} 1_{\{\beta_\lambda \ne 0\}}. \tag{3.1} \]

Remember that the distribution of the number of points of $N$ that lie in $\mathrm{supp}(\varphi_\lambda)$ is the Poisson
distribution with mean $nF_\lambda$. So, the previous condition ensures that we have a significant number
of points of $N$ to estimate non-zero wavelet coefficients. Another main point is that under (3.1),
(see Section [Q]), so (2.8) is true with large probability. The term $(\log n)(\log\log n)$ appears for technical
reasons but could be replaced by any term $u_n$ such that



lim Un = and lim 



I /logn 



n 



In practice, many interesting signals are well approximated by a function of $\mathcal{F}$. So, using Proposition
1, a convenient estimate is an estimate with a good behavior on $\mathcal{F}_n(R)$, at least for large values
of $n$ and $R$. Furthermore, note that we do not have any restriction on the precise location of the
support of functions of $\mathcal{F}_n(R)$ (even if these functions have only a finite set of non-zero wavelet
coefficients). This provides a second reason for considering $\mathcal{F}_n(R)$ if we are interested in estimating
signals with unknown or infinite supports. We now focus on $\tilde f^H_{n,\gamma}$ with the special value $\gamma = 1 + \sqrt{2}$
and we study its properties on $\mathcal{F}_n(R)$.



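Theorem 2. Let $R > 0$ be fixed. Let $\gamma = 1 + \sqrt{2}$ and let $\eta_{\lambda,\gamma}$ be as in (2.4). Then $\tilde f^H_{n,\gamma}$ achieves
the following oracle inequality: for $n$ large enough, for any $f \in \mathcal{F}_n(R)$,

\[ \mathbb{E}\big(\|\tilde f^H_{n,\gamma} - f\|_2^2\big) \le 12 \log n \Big[ \sum_{\lambda\in\Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n}) + \frac{1}{n} \Big]. \tag{3.2} \]

Inequality (3.2) shows that on $\mathcal{F}_n(R)$, our estimate achieves the oracle risk up to the term
$12 \log n$ and the negligible term $1/n$. Finally, let us mention that when $f \in \mathcal{F}_n(R)$,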



Our result is stated with $\gamma = 1 + \sqrt{2}$. This value comes from optimizations of upper bounds given
by Lemma 1 stated in Section 6.2. This constitutes a first theoretical calibration result and this is
the first step for choosing the parameter $\gamma$ in an optimal way. The next section further investigates
this problem.



4 How to choose the parameter $\gamma$

In this section, our goal is to find lower and upper bounds for the parameter $\gamma$. Theorem 1
established that for any signal, we achieve the performance of the oracle estimator up to a logarithmic term provided
$\gamma > 1$. So, our primary interest is to ask what happens, from the theoretical point of view,
when $\gamma < 1$. To handle this problem, we consider the simplest signal in our setting, namely

\[ f = 1_{[0,1]}. \]

Applying Theorem 1 with the Haar basis and $\gamma > 1$ gives

\[ \mathbb{E}\big(\|\tilde f^H_{n,\gamma} - f\|_2^2\big) \le C \frac{\log n}{n}, \]

where $C$ is a constant. The following result shows that this rate cannot be achieved for this particular
signal when $\gamma < 1$.



Calibration of thresholding rules for Poisson intensity estimation 






Theorem 3. Let $f = 1_{[0,1]}$. If $\gamma < 1$, then there exists $\delta < 1$ not depending on $n$ such that

\[ \mathbb{E}\big(\|\tilde f^H_{n,\gamma} - f\|_2^2\big) \ge \frac{c}{n^\delta}, \]

where $c$ is a constant.



Theorem 3 establishes that, asymptotically, $\tilde f^H_{n,\gamma}$ with $\gamma < 1$ cannot estimate a very simple signal
($f = 1_{[0,1]}$) at a convenient rate of convergence. This provides a lower bound for the threshold
parameter $\gamma$: we have to take $\gamma \ge 1$.

Now, let us study the upper bound for the parameter $\gamma$. For this purpose, we do not consider a
particular signal, but we use the worst oracle ratio on the whole class $\mathcal{F}_n(R)$. Remember that when
$\gamma = 1 + \sqrt{2}$, Theorem 2 gives that this ratio cannot grow faster than $12 \log n$ when $n$ goes to $\infty$:
for $n$ large enough,

\[ \sup_{f\in\mathcal{F}_n(R)} \frac{\mathbb{E}\big(\|\tilde f^H_{n,\gamma} - f\|_2^2\big)}{\sum_{\lambda\in\Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n}) + \frac{1}{n}} \le 12 \log n. \]

Our aim is to establish that the oracle ratio on $\mathcal{F}_n(R)$ for the estimator $\tilde f^H_{n,\gamma}$, where $\gamma$ is large, is
larger than the previous upper bound. This goal is reached in the following theorem.

Theorem 4. Let $\gamma_{\min} > 1$ be fixed and let $\gamma > \gamma_{\min}$. Then, for any $R > 2$,

\[ \sup_{f\in\mathcal{F}_n(R)} \frac{\mathbb{E}\big(\|\tilde f^H_{n,\gamma} - f\|_2^2\big)}{\sum_{\lambda\in\Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n}) + \frac{1}{n}} \ge 2\big(\sqrt{\gamma} - \sqrt{\gamma_{\min}}\big)^2 \log n \times (1 + o_n(1)). \]

Now, if we choose $\gamma > (1 + \sqrt{6})^2 \approx 11.9$, we can take $\gamma_{\min} > 1$ such that the resulting maximal
oracle ratio of $\tilde f^H_{n,\gamma}$ is larger than $12 \log n$ for $n$ large enough. So, taking $\gamma > 12$ is a bad choice for
estimation on the whole class $\mathcal{F}_n(R)$.

Note that the function $1_{[0,1]}$ belongs to $\mathcal{F}_n(2)$, for all $n > 2$. So, combining Theorems 2, 3 and
4 proves that the convenient choice for $\gamma$ belongs to the interval $[1, 12]$. Finally, observe that the
rate exponent deteriorates for $\gamma < 1$ whereas we only prove that the choice $\gamma > 12$ leads to worse
rate constants.
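The value $(1 + \sqrt{6})^2 \approx 11.9$ comes from requiring $2(\sqrt{\gamma} - \sqrt{\gamma_{\min}})^2 > 12$ with $\gamma_{\min}$ arbitrarily close to 1, that is $\sqrt{\gamma} > 1 + \sqrt{6}$. The short check below illustrates this arithmetic; the function name `lower_bound_constant` and the particular values of $\gamma_{\min}$ are chosen here only for illustration.

```python
import math


def lower_bound_constant(gamma, gamma_min):
    """Constant in front of log n in the lower bound of Theorem 4."""
    return 2.0 * (math.sqrt(gamma) - math.sqrt(gamma_min)) ** 2


# Value of gamma at which the constant reaches 12 when gamma_min tends to 1.
critical_gamma = (1.0 + math.sqrt(6.0)) ** 2

# gamma = 12 with gamma_min = 1.01: the lower-bound constant exceeds 12.
exceeds = lower_bound_constant(12.0, 1.01) > 12.0

# gamma = 11.5, below the critical value: the constant stays under 12.
stays_below = lower_bound_constant(11.5, 1.0001) < 12.0
```

With $\gamma = 12$, any $\gamma_{\min}$ slightly above 1 (up to roughly 1.029) makes the lower bound of Theorem 4 exceed the upper bound $12 \log n$ obtained for $\gamma = 1 + \sqrt{2}$.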



5 Numerical study 

In this section, some simulations are provided and the performances of the thresholding rule are
measured from the numerical point of view by comparing our estimator with other well-known
procedures. We also discuss the ideal choice for the parameter $\gamma$ keeping in mind that the value
$\gamma = 1$ constitutes a border for the theoretical results (see Theorems 1 and 3). For these purposes,
our procedure is performed for estimating various intensity signals and the wavelet set-up associated
with biorthogonal wavelet bases is considered. More precisely, we focus either on the Haar basis
where

\[ \phi = \tilde\phi = 1_{[0,1]}, \qquad \psi = \tilde\psi = 1_{[0,1/2]} - 1_{]1/2,1]} \]

or on a special case of spline systems given in Figure 1. The latter, called hereafter the spline
basis, has the following properties. First, the support of $\phi$, $\psi$, $\tilde\phi$ and $\tilde\psi$ is included in $[-4, 5]$. The
reconstruction wavelets $\tilde\phi$ and $\tilde\psi$ belong to $C^{1.272}$. Finally, the wavelet $\psi$ is a piecewise constant
function orthogonal to polynomials of degree 4 (see [I^]). So, such a basis has properties 1-5 required
in Section 7.2 with $r = 0.272$. Then, the signal $f$ to be estimated is decomposed as follows:

\[ f = \sum_{\lambda\in\Lambda} \beta_\lambda \tilde\varphi_\lambda = \sum_{k\in\mathbb{Z}} \alpha_k \tilde\phi_k + \sum_{j\ge 0} \sum_{k\in\mathbb{Z}} \beta_{j,k} \tilde\psi_{j,k}. \]

Figure 1: The spline basis. Top: $\phi$ and $\psi$. Bottom: $\tilde\phi$ and $\tilde\psi$.

For estimating $f$, we use the empirical coefficients $(\hat\beta_\lambda)_{\lambda\in\Lambda}$ associated with a Poisson process $N$
whose intensity with respect to the Lebesgue measure is $n \times f$. Since $\phi$ and $\psi$ are piecewise
constant functions, accurate values of the empirical coefficients are available, which allows us to avoid
many computational and approximation issues that often arise in the wavelet setting. We consider
the thresholding rule $\tilde f_\gamma = (\tilde f_{n,\gamma})_n$ with $\tilde f_{n,\gamma}$ defined in (2.7) with

\[ \Gamma_n = \{\lambda = (j,k) : \ -1 \le j \le j_0, \ k \in \mathbb{Z}\} \]

and

\[ \eta_{\lambda,\gamma} = \sqrt{2\gamma \log(n) \hat V_{\lambda,n}} + \frac{\gamma \log n}{3n} \|\varphi_\lambda\|_\infty. \]

Observe that $\eta_{\lambda,\gamma}$ slightly differs from the threshold defined in (2.4) since $\tilde V_{\lambda,n}$ is now replaced with
$\hat V_{\lambda,n}$. It allows us to derive the parameter $\gamma$ as an explicit function of the threshold, which is necessary
to draw figures without using a discretization of $\gamma$, which is crucial in Section 5.1. The performances
of our thresholding rule associated with the threshold $\eta_{\lambda,\gamma}$ defined in (2.4) are probably equivalent
(see ([QU I. 

The numerical performance of our procedure is first illustrated by performing it for estimating
nine various signals whose definitions are given in Section 8. These functions are respectively denoted
'Haar1', 'Haar2', 'Blocks', 'Comb', 'Gauss1', 'Gauss2', 'Beta0.5', 'Beta4' and 'Bumps' and have been
chosen to represent the wide variety of signals arising in signal processing. Each of them satisfies
$\|f\|_1 = 1$ and can be classified according to the following criteria: the smoothness, the size of the
support (finite/infinite), the value of the sup norm (finite/infinite) and the shape (to be piecewise
constant or a mixture of peaks). Remember that when estimating $f$, our thresholding algorithm
does not use $\|f\|_\infty$, the smoothness of $f$ or the support of $f$, denoted $\mathrm{supp}(f)$ (in particular $\|f\|_\infty$
and $\mathrm{supp}(f)$ can be infinite). Simulations are performed with $n = 1024$, so we observe on average
$n\|f\|_1 = 1024$ points of the underlying Poisson process. To complete the definition of $\tilde f_\gamma = (\tilde f_{n,\gamma})_n$,
we rely on Theorems 1 and 3 and we choose $j_0 = \log_2(n) = 10$ and $\gamma = 1$ (see conclusions of Section
5.1). Figure 2 displays intensity reconstructions we obtain for the Haar and the spline bases.

The preliminary conclusions drawn from Figure 2 are the following. As expected, a convenient
choice of the wavelet system improves the reconstructions. We notice that the estimate $\tilde f_{n,1}$ seems
to perform well for estimating the size and the location of peaks. Finally, we emphasize that the
support of each signal does not play any role (compare the estimation of 'Comb', which has an infinite
support, and the estimation of 'Haar1' for instance).

5.1 Calibration of our procedure from the numerical point of view 

In this section, we deal with the choice of the threshold parameter $\gamma$ in our procedures from a
practical point of view. We already know that the interval $[1, 12]$ is the right range for $\gamma$, theoretically
speaking. Given $n$ and a function $f$, we denote by $R_n(\gamma)$ the ratio between the $\ell_2$-performance of our
procedure (depending on $\gamma$) and the oracle risk where the wavelet coefficients at levels $j > j_0$ are
omitted. We have:

\[ R_n(\gamma) = \frac{\sum_{\lambda\in\Gamma_n} (\tilde\beta_\lambda - \beta_\lambda)^2}{\sum_{\lambda\in\Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n})} = \frac{\sum_{\lambda\in\Gamma_n} \big(\hat\beta_\lambda 1_{\{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}\}} - \beta_\lambda\big)^2}{\sum_{\lambda\in\Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n})}. \]

Of course, $R_n$ is a stepwise function and the change points of $R_n$ correspond to the values of $\gamma$
such that there exists $\lambda$ with $\eta_{\lambda,\gamma} = |\hat\beta_\lambda|$. The average over 1000 simulations of $R_n(\gamma)$ is computed,
providing an estimation of $\mathbb{E}(R_n(\gamma))$. This average ratio, denoted $\bar R_n(\gamma)$ and viewed as a function of
$\gamma$, is plotted for $n \in \{64, 128, 256, 512, 1024, 2048, 4096\}$ and for three signals considered previously:
'Haar1', 'Gauss1' and 'Bumps'. For non compactly supported signals, we need to compute an infinite
number of wavelet coefficients to determine this ratio. To overcome this problem, we omit the tails
of the signals and we focus our attention on an interval that contains all observations. Of course,
we ensure that this approximation is negligible with respect to the values of $\bar R_n$. As previously, we
take $j_0 = \log_2(n)$. Figure 3 displays $\bar R_n$ for 'Haar1' decomposed on the Haar basis. The left side of
Figure 3 gives a general idea of the shape of $\bar R_n$, while the right side focuses on small values of $\gamma$.
Similarly, Figures 4 and 5 display $\bar R_n$ for 'Gauss1' decomposed on the spline basis and for 'Bumps'
decomposed on the Haar and the spline bases.
To discuss our results, we introduce 

\[ \gamma_{\min}(n) = \mathrm{argmin}_{\gamma > 0} \, \bar R_n(\gamma). \]
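The ratio $R_n(\gamma)$ and its average can be sketched in a few lines for the signal $f = 1_{[0,1]}$ and the Haar basis. This is an illustration under stated assumptions, not the code behind the figures: the function names are hypothetical, the Haar functions are assumed to be normalized as $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$, the threshold is the one of Section 5 (with $\hat V_{\lambda,n}$), and $\gamma$ is evaluated on a user-supplied grid rather than through the exact change points of the stepwise function. For this signal, the only non-zero true coefficient is the one of index $(-1, 0)$, equal to 1, and the oracle risk reduces to $1/n$.

```python
import math
import random


def simulate_points(n, rng):
    """Poisson process on [0, 1) with intensity n * 1_[0,1], via exponential gaps."""
    points, t = [], rng.expovariate(n)
    while t < 1.0:
        points.append(t)
        t += rng.expovariate(n)
    return points


def haar_statistics(points, n, j0):
    """Map (j, k) -> (beta_hat, v_hat, sup_norm) for Haar functions supported in [0, 1)."""
    stats = {(-1, 0): (len(points) / n, len(points) / n ** 2, 1.0)}
    for j in range(j0 + 1):
        scale = 2.0 ** (j / 2)      # assumed normalisation: sup norm of psi_{j,k}
        left = [0] * (2 ** j)       # counts on the positive half of each support
        right = [0] * (2 ** j)      # counts on the negative half of each support
        for t in points:
            u = t * 2 ** j
            k = int(u)
            if u - k < 0.5:
                left[k] += 1
            else:
                right[k] += 1
        for k in range(2 ** j):
            beta_hat = scale * (left[k] - right[k]) / n
            v_hat = scale ** 2 * (left[k] + right[k]) / n ** 2
            stats[(j, k)] = (beta_hat, v_hat, scale)
    return stats


def oracle_ratio(stats, n, gamma):
    """R_n(gamma) for f = 1_[0,1], using the threshold of Section 5."""
    log_n = math.log(n)
    loss = 0.0
    for (j, _k), (beta_hat, v_hat, sup_norm) in stats.items():
        eta = math.sqrt(2 * gamma * log_n * v_hat) + gamma * log_n * sup_norm / (3 * n)
        kept = beta_hat if abs(beta_hat) >= eta else 0.0
        beta = 1.0 if j == -1 else 0.0      # true coefficients of 1_[0,1]
        loss += (kept - beta) ** 2
    oracle_risk = 1.0 / n                   # sum of min(beta^2, V) equals 1/n here
    return loss / oracle_risk


def average_ratio(n, gammas, n_sim, seed=0):
    """Monte Carlo average of R_n(gamma) over n_sim simulations, for each gamma."""
    rng = random.Random(seed)
    j0 = n.bit_length() - 1                 # integer with 2^j0 <= n < 2^(j0 + 1)
    totals = [0.0] * len(gammas)
    for _ in range(n_sim):
        stats = haar_statistics(simulate_points(n, rng), n, j0)
        for i, g in enumerate(gammas):
            totals[i] += oracle_ratio(stats, n, g)
    return [t / n_sim for t in totals]
```

Calling, for example, `average_ratio(256, [0.5, 1.0, 2.0], 100)` gives a grid approximation of the averaged ratio, and the minimizer over the grid plays the role of $\gamma_{\min}(n)$. Empirical coefficients whose support does not meet $[0, 1)$ are always thresholded and have zero true value, so restricting the computation to $[0, 1)$ loses nothing for this signal.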

For 'Haar1', $\gamma_{\min}(n) > 1$ for any value of $n$ and taking $\gamma < 1$ deteriorates the performances of
the estimate. The larger $n$, the stronger the deterioration is. Such a result was established from
the theoretical point of view in Theorem 3. In fact, Figure 3 allows us to draw the following major
conclusion for 'Haar1':

\[ \bar R_n(\gamma) \approx \bar R_n(\gamma_{\min}(n)) \approx 1 \tag{5.1} \]

for $\gamma$ belonging to a large interval that contains the value $\gamma = 1$. For instance, when $n = 4096$, the
function $\bar R_n$ is close to 1 for any value of the interval $[1, 177]$. So, we observe a kind of "plateau
phenomenon". Finally, we conclude that our thresholding rule with $\gamma = 1$ performs very well since
it achieves the same performance as the oracle estimator.

For 'Gauss1', $\gamma_{\min}(n) > 0.5$ for any value of $n$. Moreover, as soon as $n$ is large enough, the
oracle ratio for $\gamma_{\min}(n)$ is close to 1. Besides, when $n > 2048$, as for 'Haar1', $\gamma_{\min}(n)$ is larger than
1. We observe the "plateau phenomenon" as well and, as for 'Haar1', the size of the plateau increases
when $n$ increases. This can be explained by the following important property of 'Gauss1': 'Gauss1'
can be well approximated by a finite combination of the atoms of the spline basis. So, we have the
strong impression that the asymptotic result of Theorem 3 could be generalized for the spline basis.

Figure 3: The function $\gamma \to \bar R_n(\gamma)$ at two scales for 'Haar1' decomposed on the Haar basis and for
$n \in \{64, 128, 256, 512, 1024, 2048, 4096\}$ with $j_0 = \log_2(n)$.

Figure 4: The function $\gamma \to \bar R_n(\gamma)$ for 'Gauss1' decomposed on the spline basis and for
$n \in \{64, 128, 256, 512, 1024, 2048, 4096\}$ with $j_0 = \log_2(n)$.

Figure 5: The function $\gamma \to \bar R_n(\gamma)$ for 'Bumps' decomposed on the Haar and the spline bases and
for $n \in \{64, 128, 256, 512, 1024, 2048, 4096\}$ with $j_0 = \log_2(n)$.

Conclusions for 'Bumps' are very different. Note that this irregular signal has many significant
wavelet coefficients at high resolution levels whatever the basis. We have $\gamma_{\min}(n) < 0.5$ for each
value of $n$. Besides, $\gamma_{\min}(n) \approx 0$ when $n < 256$, which means that all the coefficients until $j = j_0$
have to be kept to obtain the best estimate. So, the parameter $j_0$ plays an essential role and has to
be well calibrated to ensure that there are no non-negligible wavelet coefficients for $j > j_0$. Other
differences between Figure 3 (or Figure 4) and Figure 5 have to be emphasized. For 'Bumps', when
$n > 512$, the minimum of $\bar R_n$ is well localized, there is no plateau anymore and $\bar R_n(1) > 2$. Note
that $\bar R_n(\gamma_{\min}(n))$ is larger than 1.

The previous preliminary conclusions show that the ideal choice for $\gamma$ and the performance of the
thresholding rule highly depend on the decomposition of the signal on the wavelet basis. Hence, in
the sequel, we have decided to take $j_0 = 10$ for any value of $n$ so that the decomposition on the basis
is not too coarse. To extend the previous results, Figures 6 and 7 display the average of the function
$R_n$ for the signals 'Haar1', 'Haar2', 'Blocks', 'Comb', 'Gauss1', 'Gauss2', 'Beta0.5', 'Beta4' and
'Bumps' with $j_0 = 10$. For the sake of brevity, we only consider the values $n \in \{64, 256, 1024, 4096\}$
and the average of $R_n$ is performed over 100 simulations. Figure 6 gives the results obtained for the
Haar basis and Figure 7 for the spline basis. This study allows us to draw conclusions with respect
to the issue of calibrating $\gamma$ from the numerical point of view. To present them, let us introduce
two classes of functions.

The first class is the class of signals that only have negligible coefficients at high levels of
resolution. The wavelet basis is well adapted to the signals of this class that contains 'Haar1',
'Haar2' and 'Comb' for the Haar basis and 'Gauss1' and 'Gauss2' for the spline basis. For such
signals, the estimation problem is close to a parametric problem. In this case, the performance of
the oracle estimate can be achieved at least for $n$ large enough and (5.1) is true for $\gamma$ belonging to a
large interval that contains the value $\gamma = 1$. These numerical conclusions strengthen and generalize
theoretical conclusions of Section [H

The second class of functions is the class of irregular signals with significant wavelet coefficients
at high resolution levels. For such signals $\gamma_{\min}(n) < 0.8$ and there is no "plateau" phenomenon (in
particular, we do not have $R_n(1) \approx R_n(\gamma_{\min}(n))$).

Figure 6: Average over 100 iterations of the function $R_n$ for signals decomposed on the Haar basis
and for $n \in \{64, 256, 1024, 4096\}$ with $j_0 = 10$.

Figure 7: Average over 100 iterations of the function $R_n$ for signals decomposed on the spline basis
and for $n \in \{64, 256, 1024, 4096\}$ with $j_0 = 10$.

Of course, estimation is easier and the performance of our procedure is better when the signal
belongs to the first class. But in practice, it is hard to choose a wavelet system such that the
intensity to be estimated satisfies this property. However, our study suggests the following
simple rule. If the practitioner has no idea of the ideal wavelet basis to use, the
thresholding rule should be performed with $\gamma = 1$ (or $\gamma$ slightly larger than 1), which leads to
satisfactory results whatever the class the signal belongs to.
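To make the simple rule above concrete, we give a minimal sketch of Haar coefficient thresholding with $\gamma = 1$. It is an illustration under stated assumptions, not the code used for the figures: the threshold is taken as $\eta_{\lambda,\gamma} = \sqrt{2\gamma \log n\, \tilde V_{\lambda,n}} + \frac{\gamma \log n}{3n}\|\varphi_\lambda\|_\infty$ with $\tilde V_{\lambda,n} = \hat V_{\lambda,n} + \sqrt{2\gamma\log n\,\hat V_{\lambda,n}\|\varphi_\lambda\|_\infty^2/n^2} + 3\gamma\log n\,\|\varphi_\lambda\|_\infty^2/n^2$, which is our reading of (2.4) from the bounds used in Section 6. The function names `threshold_haar` and `sample_haar2` are hypothetical, only the wavelet levels $0 \le j \le j_0$ are handled (the scaling function is omitted), and 'Haar2' is taken as $1.5$ on $[0, 0.125]$, $0.5$ on $[0.125, 0.25]$ and $1$ on $[0.25, 1]$.

```python
import numpy as np

def haar_psi(y):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    up = (y >= 0.0) & (y < 0.5)
    down = (y >= 0.5) & (y < 1.0)
    return up.astype(float) - down.astype(float)

def threshold_haar(points, n, gamma=1.0, j0=None):
    """Return {(j, k): beta_hat} for the Haar coefficients kept by the rule.

    points: observed points of the Poisson process (anywhere on the real line).
    n:      scale parameter of the intensity n * f.
    Assumption: threshold form as described in the lead-in paragraph.
    """
    points = np.asarray(points, dtype=float)
    if j0 is None:
        j0 = int(np.log2(n))
    log_n = np.log(n)
    kept = {}
    for j in range(j0 + 1):
        scaled = 2.0 ** j * points
        sup = 2.0 ** (j / 2.0)  # sup norm of psi_{j,k}
        # Only positions k whose support contains at least one point can
        # give a non-zero empirical coefficient.
        for k in np.unique(np.floor(scaled)).astype(int):
            vals = sup * haar_psi(scaled - k)
            beta_hat = vals.sum() / n
            v_hat = (vals ** 2).sum() / n ** 2
            v_tilde = (v_hat
                       + np.sqrt(2 * gamma * log_n * v_hat * sup ** 2 / n ** 2)
                       + 3 * gamma * log_n * sup ** 2 / n ** 2)
            eta = np.sqrt(2 * gamma * log_n * v_tilde) + gamma * log_n * sup / (3 * n)
            if abs(beta_hat) >= eta:
                kept[(j, int(k))] = beta_hat
    return kept

def sample_haar2(n, rng):
    """Points of a Poisson process with intensity n * Haar2 (see lead-in)."""
    count = rng.poisson(n)
    edges = np.array([0.0, 0.125, 0.25, 1.0])
    probs = np.array([0.1875, 0.0625, 0.75])  # mass of each constant piece
    piece = rng.choice(3, size=count, p=probs)
    return rng.uniform(edges[piece], edges[piece + 1])

rng = np.random.default_rng(0)
kept = threshold_haar(sample_haar2(1024, rng), n=1024, gamma=1.0)
print(sorted(kept))  # indices (j, k) of the coefficients that survive
```

Since the thresholds are computed from the observations only, no rescaling of the data and no knowledge of the support is needed in this sketch.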



5.2 Comparisons with classical procedures 

Now, let us compare our procedure with classical ones. We first consider the methodology based on
the Anscombe transformation of Poisson type observations (see [1]). This preprocessing yields
Gaussian data with a constant noise level close to 1. Then, universal wavelet thresholding proposed
by Donoho and Johnstone [13] is applied with the Haar basis. Kolaczyk corrected this standard
algorithm for burst-like Poisson data. He proposed to use Haar wavelet thresholding directly on
the binned data with especially calibrated thresholds (see [19] and [20]). In the sequel, these
algorithms are respectively denoted ANSCOMBE-UNI and CORRECTED. We briefly mention that
CORRECTED requires the knowledge of a so-called background rate that is empirically estimated
in our paper (note however that CORRECTED heavily depends on the precise knowledge of the
background rate as shown by the extensive study of Besbeas, De Feis and Sapatinas). One can
combine the wavelet transform and translation invariance to eliminate the shift dependence of the
Haar basis. When ANSCOMBE-UNI and CORRECTED are combined with translation invariance,
they are respectively denoted ANSCOMBE-UNI-TI and CORRECTED-TI in the sequel. Finally,
we consider the penalized piecewise-polynomial rule proposed by Willett and Nowak [26] (denoted
FREE-DEGREE in the sequel) for multiscale Poisson intensity estimation. Unlike our estimator,
the knowledge of the support of $f$ is essential to perform all these procedures, which will sometimes
be called "support-dependent strategies" in this section.

We first consider estimation of the signal
'Haar2' supported by $[0, 1]$, for which reconstructions with $n = 1024$ are proposed in Figure 8,
where we have taken the positive part of each estimate. For ANSCOMBE-UNI, CORRECTED
and their counterparts based on translation invariance, the finest resolution level for thresholding
is chosen to give good overall performance. For our random thresholding procedures, respectively
based on the Haar and spline bases and respectively denoted RAND-THRESH-HAAR and
RAND-THRESH-SPLINE, we still use $\gamma = 1$ and $j_0 = \log_2(n) = 10$. We note that for the setting of
Figure 8, translation invariance oversmooths estimators. Furthermore, comparing (a), (b) and (c),
we observe that universal thresholding is too conservative. Our procedure works well provided the
Haar basis is chosen, whereas FREE-DEGREE automatically selects a piecewise constant estimator.

Now, let us consider a non-compactly supported signal based on a mixture of two Gaussian densities.
We denote by $d$ the distance between the modes of these Gaussian densities and by $f_d$ the intensity
associated with this signal,

and we take $n = 1024$. To apply support-dependent strategies, we consider the interval given by
the smallest and the largest observations and data are first rescaled to be supported by the interval
$[0, 1]$. Reconstructions with $d = 10$ and $d = 70$ are given in Figure 9. RAND-THRESH-HAAR
outperforms ANSCOMBE-UNI and CORRECTED but all these procedures are too rough. To some
extent, it is also true for ANSCOMBE-UNI-TI and CORRECTED-TI even if translation invariance
improves the corresponding reconstructions. This is not the case for RAND-THRESH-SPLINE and
FREE-DEGREE. When $d = 70$, the performance of all the support-dependent strategies deteriorates,
which illustrates the harmful role of the support. In particular, procedures based on the translation
invariance principle, which periodizes the data, deal with the two main parts of the signal as if
they were close to each other; they are consequently quite inadequate. The poorer performance of
FREE-DEGREE for $d = 70$ could be expected since its theoretical performance is established
under the strong assumption that the signal is bounded from below on its (known) support. To
strengthen these results and to show the influence of the support, we compute the mean square
error over 100 simulations for each method and we provide the corresponding boxplots in
Figure 10, associated with $f_d$ when $d \in \{10, 30, 50, 70\}$. Note that when $d$ increases, unlike the other
algorithms, the performance of our thresholding rule based either on the Haar or on the spline basis
is remarkably stable. In particular, for $d = 70$, RAND-THRESH-SPLINE outperforms all the
other algorithms. Note also the very bad performance of ANSCOMBE-UNI and CORRECTED
for $d = 50$, due to the mismatch between the way the data are binned and the distance $d$.

Figure 8: Reconstructions of 'Haar2' with $n = 1024$. (a) ANSCOMBE-UNI; (b) CORRECTED; (c)
RAND-THRESH-HAAR; (d) ANSCOMBE-UNI-TI; (e) CORRECTED-TI; (f) FREE-DEGREE;
(g) RAND-THRESH-SPLINE.

Figure 9: Reconstructions of $f_d$ with $n = 1024$ (left: $d = 10$, right: $d = 70$). (a) ANSCOMBE-UNI;
(b) CORRECTED; (c) RAND-THRESH-HAAR; (d) ANSCOMBE-UNI-TI; (e) CORRECTED-TI;
(f) FREE-DEGREE; (g) RAND-THRESH-SPLINE.

Figure 10: Mean square error over 100 simulations of the different methods with $n = 1024$. From left
to right: $d$ = 10, 30, 50 and 70. (a) ANSCOMBE-UNI; (b) CORRECTED; (c) RAND-THRESH-HAAR;
(d) ANSCOMBE-UNI-TI; (e) CORRECTED-TI; (f) FREE-DEGREE; (g) RAND-THRESH-SPLINE.
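The variance-stabilizing step behind ANSCOMBE-UNI can be illustrated numerically. The sketch below uses the classical Anscombe transform $x \mapsto 2\sqrt{x + 3/8}$ applied to Poisson counts and checks by simulation that the transformed counts have a variance close to 1 when the mean is not too small. The helper names `anscombe` and `stabilised_variance` are hypothetical, and the binning and thresholding steps of the full procedure are not reproduced here.

```python
import numpy as np

def anscombe(counts):
    """Classical Anscombe transform 2 * sqrt(x + 3/8) for Poisson counts."""
    return 2.0 * np.sqrt(np.asarray(counts, dtype=float) + 3.0 / 8.0)

def stabilised_variance(mean, size=200_000, seed=0):
    """Empirical variance of Anscombe-transformed Poisson(mean) counts."""
    rng = np.random.default_rng(seed)
    sample = rng.poisson(mean, size=size)
    return anscombe(sample).var()

for mean in (20.0, 100.0):
    # The raw counts have variance equal to `mean`; after the transform the
    # variance is expected to be close to 1 whatever the mean.
    print(mean, stabilised_variance(mean))
```

The approximation degrades for bins with very few counts, which is one reason why the way the data are binned matters for this family of methods.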

The main conclusions of this short study are the following. We note that the estimate proposed in
this paper outperforms ANSCOMBE-UNI and CORRECTED (compare (a), (b) and (c)), showing
that the data-driven calibrated threshold proposed in (2.4) improves classical ones. In particular,
classical methods highly depend on the way data are binned and on the choice of resolution levels
where coefficients are thresholded, whereas our methodology only depends on $\gamma$ and on $j_0$, for which
we propose to take systematically $\gamma = 1$ and $j_0 = \log_2(n)$. However, unlike FREE-DEGREE, we
have to choose a convenient wavelet basis for decomposing the signals. Finally, the support, if too
large, can play a harmful role whenever the method needs to rescale the data. This is not the
case for the method presented in this paper, which explains the robustness of our procedures with
respect to the support issue.

6 Proofs of the results

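6.1 Proof of Proposition 1

The first point is obvious. For the second point, first, let us take $f \in \mathcal{F}$. We can write

$$f = \sum_{\lambda \in \Lambda_1} \beta_\lambda \tilde\varphi_\lambda,$$

where

$$\Lambda_1 = \{\lambda : \beta_\lambda \neq 0\}$$

is finite. Since $\beta_\lambda \neq 0$ implies $F_\lambda > 0$, we have

$$\min_{\lambda \in \Lambda_1} F_\lambda > 0.$$

So, $f$ belongs to $\mathcal{F}_n(R)$ for $n$ and $R$ large enough.

Conversely, if $f = \sum_{\lambda \in \Lambda} \beta_\lambda \tilde\varphi_\lambda$ belongs to $\mathcal{F}_n(R)$ for some $n$ and some $R > 0$ and if $f$ has an infinite
number of non-zero wavelet coefficients, then there is an infinite number of indices $\lambda = (j, k)$ such
that

$$F_\lambda = F_{j,k} \geq \frac{(\log n)(\log\log n)}{n}.$$

So, either for any arbitrarily large $j$, there exists $k$ such that

$$\frac{(\log n)(\log\log n)}{n} \leq F_{j,k} \leq \|f\|_\infty\, |\mathrm{supp}(\varphi_{j,k})| = \|f\|_\infty\, 2^{-j}\, |\mathrm{supp}(\psi)|,$$

so $f \notin L_\infty(\mathbb{R})$, or there exists $j$ such that $\sum_k F_{j,k} = +\infty$ and $f \notin L_1(\mathbb{R})$ (see (7.5)). This cannot
occur since $f \in \mathcal{F}_n(R)$. This concludes the proof of Proposition 1.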

6.2 Proof of Theorem 2

We first state the following lemma established in [23] where it is used to derive Theorem [H For the
sake of exhaustiveness, the proof of Lemma 1 is recalled in Section 7.3.

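Lemma 1. For all $\kappa$ such that $\gamma^{-1/2} < \kappa < 1$, there exists a positive constant $K$ depending on $\gamma$, $\kappa$
and $\|f\|_1$ such that

$$\frac{1-\kappa^2}{1+\kappa^2}\, \mathbb{E}\|\tilde f_{n,\gamma} - f\|_2^2 \leq \inf_{m \subset \Gamma_n} \left\{ \frac{1+\kappa^2}{1-\kappa^2} \sum_{\lambda \notin m} \beta_\lambda^2 + \frac{1-\kappa^2}{\kappa^2} \sum_{\lambda \in m} \mathbb{E}(\hat\beta_\lambda - \beta_\lambda)^2 + \sum_{\lambda \in m} \mathbb{E}(\eta_{\lambda,\gamma}^2) \right\} + \frac{K}{n},$$

where we denote by $m$ any possible subset of indices $\lambda$.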

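First, we give an upper bound for $\mathbb{E}(\eta_{\lambda,\gamma}^2)$. For any $\delta > 0$,

$$\mathbb{E}(\eta_{\lambda,\gamma}^2) \leq (1+\delta)\, 2\gamma \log n\, \mathbb{E}(\tilde V_{\lambda,n}) + (1+\delta^{-1}) \left(\frac{\gamma \log n}{3n}\right)^2 \|\varphi_\lambda\|_\infty^2.$$

Moreover,

$$\mathbb{E}(\tilde V_{\lambda,n}) \leq (1+\delta) V_{\lambda,n} + (1+\delta^{-1})\, 3\gamma \log n\, \frac{\|\varphi_\lambda\|_\infty^2}{n^2}.$$

So,

$$\mathbb{E}(\eta_{\lambda,\gamma}^2) \leq (1+\delta)^2\, 2\gamma \log n\, V_{\lambda,n} + \Delta(\delta) \left(\frac{\gamma \log n}{n}\right)^2 \|\varphi_\lambda\|_\infty^2, \tag{6.1}$$

with $\Delta(\delta)$ a constant depending only on $\delta$. Now, let us choose the parameter $\gamma$ in an optimal way.
The main terms in the upper bound given by the lemma are the first and third ones. So, we choose
$\kappa^2$ close to $\gamma^{-1}$, as required by the assumptions of the lemma, and we fix $\gamma$ such that the resulting
constants are as small as possible. We first minimize $\gamma(1+\kappa^2)/(1-\kappa^2)$ with $\kappa^2 = \gamma^{-1}$, that is
$\gamma(\gamma+1)/(\gamma-1)$, so we choose $\gamma = 1 + \sqrt{2}$. Now, we set
$\kappa = \sqrt{0.42} \approx (1+\sqrt{2})^{-1/2}$. Then, with $\delta > 0$ such that

$$(1+\delta)^2 = 11.822\,(1-\kappa^2)\,(2\gamma(1+\kappa^2))^{-1} \approx 1.00006,$$

we obtain

$$\mathbb{E}\|\tilde f_{n,\gamma} - f\|_2^2 \leq \inf_{m \subset \Gamma_n} \left\{ 6 \sum_{\lambda \notin m} \beta_\lambda^2 + \sum_{\lambda \in m} (3.4 + 11.822 \log n)\, V_{\lambda,n} + \Delta' \left(\frac{\log n}{n}\right)^2 \sum_{\lambda \in m} \|\varphi_\lambda\|_\infty^2 \right\} + \frac{K}{n},$$

where

$$\Delta' = \Delta(\delta)\, \gamma^2 (1+\kappa^2)(1-\kappa^2)^{-1}.$$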
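The numerical constants appearing in this step can be checked with a few lines of arithmetic. The sketch below assumes, as our reading of the argument, that the quantity minimized in $\gamma$ is $\gamma(\gamma+1)/(\gamma-1)$ (obtained by taking $\kappa^2 = \gamma^{-1}$ in $\gamma(1+\kappa^2)/(1-\kappa^2)$) and that the constants $6$ and $3.4$ used at the end of the proof are upper bounds for $(1+\kappa^2)^2/(1-\kappa^2)^2$ and $(1+\kappa^2)/\kappa^2$ respectively. The variable names are ours.

```python
import math

gamma = 1.0 + math.sqrt(2.0)   # threshold parameter chosen in the proof
kappa2 = 0.42                  # kappa^2, slightly above 1 / gamma
ratio = (1.0 + kappa2) / (1.0 - kappa2)

def objective(g):
    """gamma * (1 + kappa^2) / (1 - kappa^2) evaluated at kappa^2 = 1 / gamma."""
    return g * (g + 1.0) / (g - 1.0)

# Grid search for the minimiser of the objective on (1, 6].
grid = [1.0 + i * 1e-4 for i in range(1, 50001)]
gamma_star = min(grid, key=objective)

# Value of (1 + delta)^2 such that 2 * gamma * ratio * (1 + delta)^2 = 11.822.
one_plus_delta_sq = 11.822 / (2.0 * gamma * ratio)

print("minimiser on the grid:", gamma_star, "vs 1 + sqrt(2) =", gamma)
print("(1 + delta)^2 =", one_plus_delta_sq)
print("(1 + k^2)^2 / (1 - k^2)^2 =", ratio ** 2)
print("(1 + k^2) / k^2 =", (1.0 + kappa2) / kappa2)
```

The check also confirms that $\kappa^2 = 0.42$ is slightly larger than $\gamma^{-1} = \sqrt{2} - 1$, so that the constraint $\kappa > \gamma^{-1/2}$ is satisfied.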

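Let $n$ and $R > 0$ be fixed and let $f \in \mathcal{F}_n(R)$. Assume that $\beta_\lambda \neq 0$. In this case,

$$F_\lambda \geq \frac{(\log n)(\log\log n)}{n}.$$

But

$$F_\lambda \leq 2^{-\max(j,0)} \|f\|_\infty \leq 2^{-\max(j,0)} R$$

for $\lambda = (j, k)$. So $2^j \leq 2^{j_0}$ holds for $n$ large enough and $\lambda$ belongs to $\Gamma_n$. Finally, we conclude that
$\beta_\lambda \neq 0$ implies $\lambda \in \Gamma_n$. Now, take

$$m = \{\lambda \in \Gamma_n : \beta_\lambda^2 > V_{\lambda,n}\}.$$

If $m$ is empty, then $\beta_\lambda^2 = \min(\beta_\lambda^2, V_{\lambda,n})$ for every $\lambda \in \Gamma_n$. Hence

$$\mathbb{E}\|\tilde f_{n,\gamma} - f\|_2^2 \leq 6 \sum_{\lambda \in \Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n}) + \frac{K}{n}$$

and Theorem 2 is proved. If $m$ is not empty, with $\lambda = (j, k)$,

$$V_{\lambda,n} = \frac{2^{\max(j,0)} F_\lambda}{n} = \frac{\|\varphi_\lambda\|_\infty^2 F_\lambda}{n}.$$

Hence, for all $n$, if $\lambda \in m$, then $\beta_\lambda \neq 0$ and

$$V_{\lambda,n} \geq \frac{(\log n)(\log\log n)\, \|\varphi_\lambda\|_\infty^2}{n^2},$$

and if $n$ is large enough,

$$0.1 \log n \sum_{\lambda \in m} V_{\lambda,n} \geq \Delta' \left(\frac{\log n}{n}\right)^2 \sum_{\lambda \in m} \|\varphi_\lambda\|_\infty^2 + 3.4 \sum_{\lambda \in m} V_{\lambda,n}.$$

Theorem 2 is proved since for $n$ large enough (that depends on $R$), we obtain:

$$\mathbb{E}\|\tilde f_{n,\gamma} - f\|_2^2 \leq 6 \sum_{\lambda \notin m} \beta_\lambda^2 + 11.922 \log n \sum_{\lambda \in m} V_{\lambda,n} + \frac{K}{n} \leq 12 \log n \left( \sum_{\lambda \notin m} \beta_\lambda^2 + \sum_{\lambda \in m} V_{\lambda,n} + \frac{1}{n} \right).$$

6.3 Proof of Theorem 3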

Let 7 < 1. Note that for all e > 0, 

27yA,nlogn + ^IIv^aIIoo < VXn < V'x,, ■= + e)log {n)Vx,n + (6.2) 



3n 



n 



where = Ve + 6 + 1/3 depends only on e. We choose e such that 7' = 7(1 + e) < 1. Let a > 1 
and n be fixed. We set j the positive integer such that 



(log n) 

For all k G {0, 2^ - 1}, we define 



" < 2^- < 



(log n) 



= / dN and iV", = 



(fc+l)2- 



These variables are i.i.d. random Poisson variables of parameter finj = ^2 ^ ^. Moreover, 

23 



2i 



Pj,k = -{N^k - Ni,k) and = ^(iV+, + iV^. 

Hence, 

fe=0 



A;=0 



A;=0 



|Af+fc-iV-J> j27'log („)(Ar+ +JV- )+log (n)7t«, 



Let Un be a bounded sequence that will be fixed later such that Un > ^w^. We set 

2 



47'log (n)/i„j + log (n)ii„ 



where jln.j is the largest integer smaller that ^n,j- Note that if 



then 



= An,, + 



and iVj- = /in,. 



2 ' 



27'log (n)(7V+, + AT-J + log (n)tx„. 



Let and be two independent Poisson variables of parameter /in,j - Then, 

E(||/^7 - /Hi) > ^^^n,,P f iV+ = /in,i + and iV- = " 
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Note that 



and 



So, we set 



lim — — — = lim — — = 0. 



and rUnJ = /^n, 



that go to +00 with n. Now, we take a bounded sequence Un such that for any n, ^ ^"'^ is an integer 
and Un > 7U)£. Hence by the Stirhng formula, 



(loi 



> 



> 



> 



(logn)2«/„j! m„j! 



47'e / /i 



(logn)2"-i ^ / 



) (l + On(l)) 



27r Y^Z„ jTTi^j- 



27'e ^ -Mnj 

-e 



2m„ 



2m„ 



7r(logn)2"-i 

where = (1 + x)log (1 + x) — x = x^/2 + O(x^). So, 



(l+On(l)) 



E(||/^,-/||i)> 



-e 



3 



Since 



we obtain 



7r(logn)2""i 
Vn,j = 47'log (n)/i„j(l + o„,(l)). 



" 7r(lognj^" ^ 



IE(||/^,-/lli)>^(l+On(l)), 



Finally, for every 5 > 7', 

and Theorem [3] is proved. 

6.4 Proof of Theorem 4

Without loss of generality, the result is proved for $R = 2$. Before proving Theorem 4, let us state
the following result.
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Lemma 2. Let 7min £ (1)7) fixed and let ??A,7niin threshold associated with 7n 



where 



II 112 II II 2 

1 ' V ' j^z ^z 



(see {2-4^ ). Let u = {un)n be a sequence of positive numbers and 

A„ = {A G r„ : F{r^xa < \Px\ + Vx,j^J < Un} ■ 

Then 

ml^n - /Hi) > f E /^A ) (1 - (3n-^-" + n„)). 
\AeA„ / 

Proof. 

mfj, - /ll^) > E ((/3a - /3a)^1|^,|>,,^ + /3!l|^,|<,,,^ 
AeA„ 

> ^ /?M|/3a| < r?A,7) 
aga„ 



PlF{\Px - (3x\ + |/3a| < rjx,j) 

aga„ 

^ /?|P(|/3a - Px\ < ??A,7,„i„ and rjx,^^,^ + |/3a| < ??A,7) 



> 

agAi, 

> 

agAu 

^ E ^A (l - {mx - /?a| > r?A,7^,J + P(??A,7„,„ + |/?a| > r?A, 

aga„ 



> E (l-(3n-^-"+^.)), 



, aga« 



by applying the technical Lemma [3] of the Appendix section. 
Using Lemma O we give the proof of Theorem [H Let us consider 



/ = l[o,i] + 2^ V 1 V'i-fc' 

feGAO 

with 

AA,= {0,1,..., 2^-1} 

and 

n , 2n 

< 2^ < — — , a > 0. 



(logn)i+" ~ (logn)i+" 
Note that for any k £ Mj, 

(log n)(log log n 



n 
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for n large enough and / belongs to J^n(2). Furthermore, for any k £ Mj, 

V{j,k),n = V(-lfl),n = -■ 

So, for n large enough, 

mm{PlVx,n) = ^(-l,0),n + Yl ^iJ<k),n = ^ + 5^ ^• 

Now, to apply Lemma O let us set for any n, Un = n~'^ and observe that for any e > 0, since 

Tmin < 7i 



P(^A,7„.i„ + \Px\ > VX,^) < + e)27mmlogny,™'^ + (1 + e^^)/?^ > 27log?lVA,n) 

Pi 



with 

,2 _ 2(^/7 - V7mm)^log n 



n 



With e = v/7/7min - 1 and 9 = v^7mm/7, 

P((l + e)27^i„lognVX;i'^ + (1 + e-^)pl > 27lognyA,n) = IP(^n"n° + (1 " 0)Vx,n > Vx,n) 
Since < ^A,n, 

So, 

{{j,k): k£Mj}cAu, 

and 



> (V7 - V7^)'21ogn 5^ -(1 - (3n-^-'" + n"^)) 



, n 



\xern ^ ) 

Finally, since card(A/'j) — > +cxd when n +oo, 

IE(ll/n . -/ii) > _ ^)22iog n(l + o„(l)). 

EA6r„min(/3^,l^A,n) + ;i 

7 Appendix: Technical tools 

7.1 Some probabilistic properties of the Poisson process 

Let us first recall some basic facts about Poisson processes. 

Definition 1. Let $(\mathbb{X}, \mathcal{X})$ be a measurable space. Let $N$ be a random countable subset of $\mathbb{X}$. $N$ is
said to be a Poisson process on $(\mathbb{X}, \mathcal{X})$ if

1. for any $A \in \mathcal{X}$, the number of points of $N$ lying in $A$ is a random variable, denoted $N_A$, which
obeys a Poisson distribution with parameter $\mu(A)$, where $\mu$ is a measure on $\mathbb{X}$.

2. for any finite family of disjoint sets $A_1, \ldots, A_n$ of $\mathcal{X}$, $N_{A_1}, \ldots, N_{A_n}$ are independent random
variables.

We focus here on the case $\mathbb{X} = \mathbb{R}$. Let us mention that a Poisson process $N$ is infinitely
divisible, which means that it can be written as follows: for any positive integer $k$,

$$dN = \sum_{i=1}^{k} dN_i \tag{7.1}$$

where the $N_i$'s are mutually independent Poisson processes on $\mathbb{R}$ with mean measure $\mu/k$. The
following proposition (sometimes attributed to Campbell (see [18])) is fundamental.

Proposition 2. For any measurable function $g$ and any $z \in \mathbb{R}$ such that $\int e^{z g(x)} d\mu_x < \infty$, one has

$$\mathbb{E}\left[\exp\left(z \int_{\mathbb{R}} g(x)\, dN_x\right)\right] = \exp\left(\int_{\mathbb{R}} \left(e^{z g(x)} - 1\right) d\mu_x\right).$$

So,

$$\mathbb{E}\left(\int_{\mathbb{R}} g(x)\, dN_x\right) = \int_{\mathbb{R}} g(x)\, d\mu_x, \qquad \mathrm{Var}\left(\int_{\mathbb{R}} g(x)\, dN_x\right) = \int_{\mathbb{R}} g^2(x)\, d\mu_x.$$

If $g$ is bounded, this implies the following exponential inequality. For any $u > 0$,

$$\mathbb{P}\left(\int_{\mathbb{R}} g(x)\,(dN_x - d\mu_x) \geq \sqrt{2u \int_{\mathbb{R}} g^2(x)\, d\mu_x} + \frac{1}{3}\|g\|_\infty u\right) \leq \exp(-u). \tag{7.2}$$
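The mean and variance identities of Proposition 2 can be checked by simulation. The following sketch is only an illustration under our own choices: a homogeneous Poisson process on $[0, 1]$ with mean measure $\mu = \text{rate} \times \text{Lebesgue}$, the test function $g(x) = 2 + \cos(2\pi x)$, and hypothetical helper names `poisson_integral` and `campbell_check`. For rate 50, the formulas give a mean of $100$ and a variance of $225$.

```python
import numpy as np

def poisson_integral(g, rate, rng):
    """One draw of the integral of g against a Poisson process on [0, 1]
    with mean measure rate * Lebesgue."""
    count = rng.poisson(rate)
    points = rng.uniform(0.0, 1.0, size=count)
    return float(np.sum(g(points)))

def campbell_check(g, rate, n_draws=20000, seed=0):
    """Compare Monte Carlo moments with the integrals of g and g^2 against mu."""
    rng = np.random.default_rng(seed)
    draws = np.array([poisson_integral(g, rate, rng) for _ in range(n_draws)])
    grid = (np.arange(100_000) + 0.5) / 100_000  # midpoint rule on [0, 1]
    return {
        "mean_mc": draws.mean(),
        "mean_formula": rate * np.mean(g(grid)),
        "var_mc": draws.var(),
        "var_formula": rate * np.mean(g(grid) ** 2),
    }

def g_example(x):
    return 2.0 + np.cos(2.0 * np.pi * x)

result = campbell_check(g_example, rate=50.0)
print(result)
```

The variance identity is the one that makes $\hat V_{\lambda,n}$ a natural estimate of the variance of $\hat\beta_\lambda$.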



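7.2 Biorthogonal wavelet bases

We set

$$\phi = \mathbf{1}_{[0,1]}.$$

For any $r > 0$, there exist three functions $\psi$, $\tilde\phi$ and $\tilde\psi$ with the following properties:

1. $\tilde\phi$ and $\tilde\psi$ are compactly supported,

2. $\tilde\phi$ and $\tilde\psi$ belong to $C^{r+1}$, where $C^{r+1}$ denotes the Hölder space of order $r + 1$,

3. $\psi$ is compactly supported and is a piecewise constant function,

4. $\psi$ is orthogonal to polynomials of degree no larger than $r$,

5. $\{(\phi_k, \psi_{j,k})_{j \geq 0, k \in \mathbb{Z}}, (\tilde\phi_k, \tilde\psi_{j,k})_{j \geq 0, k \in \mathbb{Z}}\}$ is a biorthogonal family: for any $j, j' \geq 0$, for any $k, k'$,

$$\int_{\mathbb{R}} \psi_{j,k}(x)\tilde\phi_{k'}(x)\,dx = \int_{\mathbb{R}} \phi_k(x)\tilde\psi_{j',k'}(x)\,dx = 0,$$

$$\int_{\mathbb{R}} \phi_k(x)\tilde\phi_{k'}(x)\,dx = \mathbf{1}_{k=k'}, \qquad \int_{\mathbb{R}} \psi_{j,k}(x)\tilde\psi_{j',k'}(x)\,dx = \mathbf{1}_{j=j',\,k=k'},$$

where for any $x \in \mathbb{R}$ and for any $(j, k) \in \mathbb{N} \times \mathbb{Z}$,

$$\phi_k(x) = \phi(x - k), \qquad \psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$$

and

$$\tilde\phi_k(x) = \tilde\phi(x - k), \qquad \tilde\psi_{j,k}(x) = 2^{j/2}\tilde\psi(2^j x - k).$$

This implies the wavelet decomposition (2.1) of $f$. Such biorthogonal wavelet bases have been built
by Cohen, Daubechies and Feauveau as a special case of spline systems (see also the elegant
equivalent construction of Donoho from boxcar functions). The Haar basis can be viewed as a
particular biorthogonal wavelet basis, by setting $\tilde\phi = \phi$ and $\tilde\psi = \psi = \mathbf{1}_{[0,1/2]} - \mathbf{1}_{]1/2,1]}$, with $r = 0$, even
if Property 2 is not satisfied with such a choice. The Haar basis is an orthonormal basis but this is
not true for general biorthogonal wavelet bases. However, we have the frame property: if we denote

$$\Phi = \{\phi, \psi, \tilde\phi, \tilde\psi\},$$

there exist two constants $c_1(\Phi)$ and $c_2(\Phi)$ only depending on $\Phi$ such that

$$c_1(\Phi)\left(\sum_{k \in \mathbb{Z}} \alpha_k^2 + \sum_{j \geq 0}\sum_{k \in \mathbb{Z}} \beta_{j,k}^2\right) \leq \left\|\sum_{k \in \mathbb{Z}} \alpha_k \tilde\phi_k + \sum_{j \geq 0}\sum_{k \in \mathbb{Z}} \beta_{j,k}\tilde\psi_{j,k}\right\|_2^2 \leq c_2(\Phi)\left(\sum_{k \in \mathbb{Z}} \alpha_k^2 + \sum_{j \geq 0}\sum_{k \in \mathbb{Z}} \beta_{j,k}^2\right).$$

For instance, when the Haar basis is considered, $c_1(\Phi) = c_2(\Phi) = 1$. In particular, we have

$$c_1(\Phi)\|\tilde\beta - \beta\|_{\ell_2}^2 \leq \|\tilde f_{n,\gamma} - f\|_2^2 \leq c_2(\Phi)\|\tilde\beta - \beta\|_{\ell_2}^2. \tag{7.3}$$

An important feature of such bases is the following: there exists a constant $\mu_\psi > 0$ such that

$$\inf_{x \in [0,1]} |\phi(x)| \geq 1, \qquad \inf_{x \in \mathrm{supp}(\psi)} |\psi(x)| \geq \mu_\psi, \tag{7.4}$$

where $\mathrm{supp}(\psi) = \{x \in \mathbb{R} : \psi(x) \neq 0\}$.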
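For the Haar case, the biorthogonality relations reduce to orthonormality and the frame constants equal 1. This can be verified numerically on the first few atoms. The sketch below is ours (the function names and the midpoint grid are illustrative choices): it computes the Gram matrix of $\phi$ and of $\psi_{j,k}$ for $0 \le j \le 2$, $0 \le k < 2^j$ on $[0, 1]$. Since these functions are piecewise constant on dyadic intervals, a midpoint rule on a fine dyadic grid computes the inner products exactly up to rounding.

```python
import numpy as np

def phi(x):
    """Haar scaling function, indicator of [0, 1]."""
    return ((x >= 0.0) & (x <= 1.0)).astype(float)

def psi(x):
    """Haar wavelet: +1 on [0, 1/2], -1 on ]1/2, 1]."""
    return ((x >= 0.0) & (x <= 0.5)).astype(float) - ((x > 0.5) & (x <= 1.0)).astype(float)

def psi_jk(x, j, k):
    """Dilated and translated wavelet 2^{j/2} psi(2^j x - k)."""
    return 2.0 ** (j / 2.0) * psi(2.0 ** j * x - k)

def haar_gram(max_level=2, grid_size=4096):
    """Gram matrix of phi and psi_{j,k}, 0 <= j <= max_level, on [0, 1]."""
    x = (np.arange(grid_size) + 0.5) / grid_size  # midpoints avoid breakpoints
    atoms = [phi(x)]
    for j in range(max_level + 1):
        for k in range(2 ** j):
            atoms.append(psi_jk(x, j, k))
    basis = np.stack(atoms)
    return basis @ basis.T / grid_size

gram = haar_gram()
print(np.round(gram, 10))  # expected to be the identity matrix
```

For a general biorthogonal spline system the same check would involve two different families (analysis and reconstruction), and the Gram matrix of a single family would no longer be the identity.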
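7.3 Proof of Lemma 1

The proof of Lemma 1 is based on the following result proved in [23].

Theorem 5. To estimate a countable family $\beta = (\beta_\lambda)_{\lambda \in \Lambda}$ such that $\|\beta\|_{\ell_2} < \infty$, we assume
that a family of coefficient estimators $(\hat\beta_\lambda)_{\lambda \in \Gamma}$, where $\Gamma$ is a known deterministic subset of $\Lambda$, and
a family of possibly random thresholds $(\eta_\lambda)_{\lambda \in \Gamma}$ are available. We consider the thresholding rule
$\tilde\beta = (\hat\beta_\lambda \mathbf{1}_{|\hat\beta_\lambda| \geq \eta_\lambda} \mathbf{1}_{\lambda \in \Gamma})_{\lambda \in \Lambda}$. Let $\varepsilon > 0$ be fixed. Assume that there exist a deterministic family
$(F_\lambda)_{\lambda \in \Gamma}$ and three constants $\kappa \in [0, 1[$, $\omega \in [0, 1]$ and $\mu > 0$ (that may depend on $\varepsilon$ but not on $\lambda$)
with the following properties.

(A1) For all $\lambda$ in $\Gamma$,

$$\mathbb{P}(|\hat\beta_\lambda - \beta_\lambda| > \kappa\eta_\lambda) \leq \omega.$$

(A2) There exist $1 < p, q < \infty$ with $\frac{1}{p} + \frac{1}{q} = 1$ and a constant $R > 0$ such that for all $\lambda$ in $\Gamma$,

$$\left(\mathbb{E}(|\hat\beta_\lambda - \beta_\lambda|^{2p})\right)^{1/p} \leq R \max(F_\lambda, F_\lambda^{1/p}\varepsilon^{1/q}).$$

(A3) There exists a constant $\theta$ such that for all $\lambda$ in $\Gamma$ such that $F_\lambda < \theta\varepsilon$,

$$\mathbb{P}(|\hat\beta_\lambda - \beta_\lambda| > \kappa\eta_\lambda,\ |\hat\beta_\lambda| > \eta_\lambda) \leq F_\lambda \mu.$$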
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Then the estimator (3 satisfies 

y A^m Agm ASm J AgT 

with 

R 



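To prove Lemma 1, we apply Theorem 5 with $\hat\beta_\lambda$ defined in (2.3), $\eta_\lambda = \eta_{\lambda,\gamma}$ defined in (2.4) and
$\Gamma = \Gamma_n$ defined in (ED). We set

$$F_\lambda = \int_{\mathrm{supp}(\varphi_\lambda)} f(x)\,dx,$$

so we have:

$$\sum_{\lambda \in \Gamma_n} F_\lambda = \sum_{-1 \leq j \leq j_0} \sum_k \int_{x \in \mathrm{supp}(\varphi_{j,k})} f(x)\,dx \leq \int f(x) \sum_{-1 \leq j \leq j_0} \sum_k \mathbf{1}_{x \in \mathrm{supp}(\varphi_{j,k})}\,dx \leq (j_0 + 2)\, m_\varphi \|f\|_1, \tag{7.5}$$

where $m_\varphi$ is a finite constant depending only on the compactly supported functions $\phi$ and $\psi$. Finally,
$\sum_{\lambda \in \Gamma_n} F_\lambda$ is bounded by $\log(n)$ up to a constant that only depends on $\|f\|_1$ and the functions $\phi$
and $\psi$. Now, we give a fundamental lemma to derive Assumption (A1) of Theorem 5.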

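Lemma 3. For any $u > 0$,

$$\mathbb{P}\left(|\hat\beta_\lambda - \beta_\lambda| \geq \sqrt{2uV_{\lambda,n}} + \frac{\|\varphi_\lambda\|_\infty u}{3n}\right) \leq 2e^{-u}. \tag{7.6}$$

Moreover, for any $u > 0$,

$$\mathbb{P}\left(V_{\lambda,n} \geq \tilde V_{\lambda,n}(u)\right) \leq e^{-u},$$

where

$$\tilde V_{\lambda,n}(u) = \hat V_{\lambda,n} + \sqrt{2\hat V_{\lambda,n}\, u\, \frac{\|\varphi_\lambda\|_\infty^2}{n^2}} + 3u\, \frac{\|\varphi_\lambda\|_\infty^2}{n^2}.$$

Proof. Equation (7.6) comes easily from (7.2) applied with $g = \varphi_\lambda/n$. The same inequality applied
with $g = -\varphi_\lambda^2/n^2$ gives:

$$\mathbb{P}\left(V_{\lambda,n} \geq \hat V_{\lambda,n} + \sqrt{2u \int_{\mathbb{R}} \frac{\varphi_\lambda^4(x)}{n^4}\, n f(x)\,dx} + \frac{\|\varphi_\lambda\|_\infty^2 u}{3n^2}\right) \leq e^{-u}.$$

We observe that

$$\int_{\mathbb{R}} \frac{\varphi_\lambda^4(x)}{n^4}\, n f(x)\,dx \leq \frac{\|\varphi_\lambda\|_\infty^2}{n^2} V_{\lambda,n}.$$

So, if we set $a = u\, \|\varphi_\lambda\|_\infty^2 / n^2$, then

$$\mathbb{P}\left(V_{\lambda,n} - \sqrt{2V_{\lambda,n}\,a} - a/3 \geq \hat V_{\lambda,n}\right) \leq e^{-u}.$$

We obtain

$$\mathbb{P}\left(\sqrt{V_{\lambda,n}} \geq \mathcal{V}^{-1}(\hat V_{\lambda,n})\right) \leq e^{-u},$$

where $\mathcal{V}^{-1}(\hat V_{\lambda,n})$ is the positive solution of

$$\left(\mathcal{V}^{-1}(\hat V_{\lambda,n})\right)^2 - \sqrt{2a}\, \mathcal{V}^{-1}(\hat V_{\lambda,n}) - (a/3 + \hat V_{\lambda,n}) = 0.$$

To conclude, it remains to observe that

$$\tilde V_{\lambda,n}(u) \geq \left(\mathcal{V}^{-1}(\hat V_{\lambda,n})\right)^2 = \left(\sqrt{\hat V_{\lambda,n} + 5a/6} + \sqrt{a/2}\right)^2.$$

Let $\kappa < 1$. Combining these inequalities with $\tilde V_{\lambda,n} = \tilde V_{\lambda,n}(\gamma \log n)$ yields

$$\mathbb{P}(|\hat\beta_\lambda - \beta_\lambda| > \kappa\eta_{\lambda,\gamma}) \leq \mathbb{P}\left(|\hat\beta_\lambda - \beta_\lambda| \geq \sqrt{2\kappa^2\gamma \log n\, \tilde V_{\lambda,n}} + \frac{\kappa\gamma \log n\, \|\varphi_\lambda\|_\infty}{3n}\right)$$

$$\leq \mathbb{P}\left(|\hat\beta_\lambda - \beta_\lambda| \geq \sqrt{2\kappa^2\gamma \log n\, \tilde V_{\lambda,n}} + \frac{\kappa\gamma \log n\, \|\varphi_\lambda\|_\infty}{3n},\ V_{\lambda,n} \geq \tilde V_{\lambda,n}\right)$$

$$\quad + \mathbb{P}\left(|\hat\beta_\lambda - \beta_\lambda| \geq \sqrt{2\kappa^2\gamma \log n\, \tilde V_{\lambda,n}} + \frac{\kappa\gamma \log n\, \|\varphi_\lambda\|_\infty}{3n},\ V_{\lambda,n} < \tilde V_{\lambda,n}\right)$$

$$\leq \mathbb{P}(V_{\lambda,n} \geq \tilde V_{\lambda,n}) + \mathbb{P}\left(|\hat\beta_\lambda - \beta_\lambda| \geq \sqrt{2\kappa^2\gamma \log n\, V_{\lambda,n}} + \frac{\kappa\gamma \log n\, \|\varphi_\lambda\|_\infty}{3n}\right)$$

$$\leq n^{-\gamma} + 2n^{-\kappa^2\gamma}.$$

So, for any value of $\kappa \in [0, 1[$, Assumption (A1) is true with $\eta_\lambda = \eta_{\lambda,\gamma}$ and $\Gamma = \Gamma_n$ if we take
$\omega = 3n^{-\kappa^2\gamma}$. To satisfy the Rosenthal type inequality (A2) of Theorem 5, we prove the following
lemma.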

Lemma 4. For any p> 1, there exists an absolute constant C such that 

2p-2 \ 



1E(|/3a-/?aP^)<C7V^(^^£„ + 



\\V>x\\ 



n 



Vx,n 



Proof. We apply (17.11) . Hence, 



Px-Px = Y.[ ^ {dK - nk-'f{x)dx) =Y,Y, 

i=l i=l 



where for any i, 



Yi 



So the YiS are i.i.d. centered variables, each of them having a moment of order 2p. For any i, we 
apply the Rosenthal inequality (see Theorem 2.5 of jl^) to the positive and negative parts of Y^. 
This easily implies that 



E 



i=l 



2p\ 



16p 

< ; — r max 

,log(2p); 



2p 



4 = 1 



1 = 1 
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It remains to bound the upper limit of '^{Yli=i for ^ ^ 2} > 2 when k — > oo. Let us 
introduce 

flk = {caxd{N^) < 1 for any i e {1,... , k}}. 
Then, it is easy to see that P(O^) < k~\n\\f\\i)^ (see e.g., (1711)]) below). 



OnQk, \Yif = Ok{k^^) if card(A^i) = and \Yi\ 



+0k k 



,-1 



|yA(T)| 
n 



if r ^dNi 

J n x 



where T is the point of the process N^. Consequently, 



i=l 



\T£N 



\MT)\ 



n 



+ Ok k-' 



\MT)\ 



n 



+ kOk{k~ 



\ 



E 



El*-.! 



(7.7) 



But we have 



1=1 



W\\\ 



< 2' 



i=l 

V\ loo 

n 



n 



{N^Y+(k-^ / \^^{x)\f{x)dx 



{N^Y + kik-^ / |(^A(x)|/(x)dx 



So, when k +00, the last term in (|7.7p converges to since a Poisson variable has moments of 
every order and 



limsup,._^E^|yi|^ <E 



i=l 



n 



n 



which concludes the proof. 
Now, 



n 



^l{x)f{x)dx < 



and Assumption (A2) is satisfied with e = ^ and 

2Cp22Jo max 



R 



n 



(7. 



n 



since ||95a||oo ^ ^-"^ max 



^HIV'II^) and 



n 



n 



Finally, Assumption (A3) comes from the following lemma. 
Lemma 5. We set

$$N_\lambda = \int_{\mathrm{supp}(\varphi_\lambda)} dN \quad \text{and} \quad C' = (\sqrt{6} + 1/3)\gamma \geq \sqrt{6} + 1/3.$$

There exists an absolute constant $0 < \theta' < 1$ such that if

$$nF_\lambda \leq \theta' C' \log n$$

and

$$(1 - \theta')(\sqrt{6} + 1/3)\log n \geq 2 \tag{7.9}$$

then,

$$\mathbb{P}(N_\lambda - nF_\lambda \geq (1 - \theta')C' \log n) \leq F_\lambda n^{-\gamma}.$$

Remark 1. We can take $\theta' = 0.01$ and in this case, (7.9) is satisfied as soon as $n \geq 3$.
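Remark 1 is a one-line computation that can be checked directly. The short sketch below evaluates the left-hand side of (7.9) with $\theta' = 0.01$ and the natural logarithm. The function name `condition_7_9` is ours.

```python
import math

def condition_7_9(n, theta_prime=0.01):
    """True when (1 - theta') * (sqrt(6) + 1/3) * log(n) >= 2."""
    return (1.0 - theta_prime) * (math.sqrt(6.0) + 1.0 / 3.0) * math.log(n) >= 2.0

# The condition fails for n = 2 and holds from n = 3 onwards.
smallest_n = next(n for n in range(2, 100) if condition_7_9(n))
print(smallest_n)
```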
Proof. One takes 0' G [0, 1] (for instance 9' = 0.01) such that 



We use Equation (5.2) of [2j| to obtain 

, , / ((1 - 6'')C"logn)2 \ 3(i-fl')2 , 

- „F. > (1 - «')Clog„) < exp (- ,(„^V(ii,,)cVgn/3) ) ^ " ' 

If nFx > since ^^Qrp^C > 27 + 2, the result is true. If nFx < n~'>'~^, 

P(iVA-nFA > (l-9')C'logn) < F(Nx > (l-9')C'logn) < ¥(Nx > 2) < V ^""f^f^ e""-^^ < (nFA)^ 

-'-^ /c! 

fc>2 

(7.10) 

and the result is true. ■ 
Now, observe that if |/3a| > r/A,^ then 

AfA>C"logn. 

Indeed, \(3x\ > Vx,-y implies 

Iv^aIoo < \(3x\ < ■ 

n n 

So if n satisfies (1 — 9'){^/6 + l/3)log?i > 2, we set 9 = 9'C'log{n) and /i = n~^. In this case, 
Assumption (A3) is fulfilled since if nFx < 9'C'log n 

n0x - Px\ > ^Vx, 0x\ > Vx) < n^x - nFx > (1 - ^O^^'logn) < Fxn-\ 
Finally, if re satisfies (1 — 9'){\/6 + l/3)logre > 2, Theorem [5] gives: 

- P\l < ( 1^ E + ^ E - Px? + E ] + ldy^fx. 

y X^m. X£m X£m J AeT 

In addition, there exists a constant Ki depending on p, 7, ||/||i and on $ such that 

LL> 2_, < A'l log(n)n (7.11) 
Aer 

2 

Since 7 > 1, for all k < 1, there exists q > 1 such that 1 < and as required by Theorem [U the 
last term satisfies 

^-^ n 

Aer 

where $K(\gamma, \kappa, \|f\|_1)$ denotes a positive constant. This concludes the proofs.

The following table gives the definition of the signals used in Section 5.

Haar1: $\mathbf{1}_{[0,1]}$

Haar2: $1.5\,\mathbf{1}_{[0,0.125]} + 0.5\,\mathbf{1}_{[0.125,0.25]} + \mathbf{1}_{[0.25,1]}$

Blocks: $\left(2 + \sum_j \frac{h_j}{2}\left(1 + \mathrm{sgn}(x - p_j)\right)\right) \frac{\mathbf{1}_{[0,1]}}{3.551}$

Comb: $32 \sum_{k=1}^{+\infty} \frac{1}{k 2^k}\, \mathbf{1}_{[k^2/32,\,(k^2+k)/32]}$

Gauss1: $\frac{1}{0.25\sqrt{2\pi}} \exp\left(\frac{-(x - 0.5)^2}{2 \times 0.25^2}\right)$

Gauss2: $\frac{1}{\sqrt{2\pi}} \exp\left(\frac{-(x - 0.5)^2}{2 \times 0.25^2}\right) + \frac{3}{\sqrt{2\pi}} \exp\left(\frac{-(x - 5)^2}{2 \times 0.25^2}\right)$

Beta0.5: $0.5\,x^{-0.5}\,\mathbf{1}_{]0,1]}$

Beta4: $3x^{-4}\,\mathbf{1}_{[1,+\infty[}$

Bumps: $\propto \left(\sum_j g_j \left(1 + \frac{|x - p_j|}{w_j}\right)^{-4}\right) \mathbf{1}_{[0,1]}$

where

p = [ 0.1  0.13  0.15  0.23  0.25  0.4  0.44  0.65  0.76  0.78  0.81 ]

h = [ 4  -5  3  -4  5  -4.2  2.1  4.3  -3.1  2.1  -4.2 ]

g = [ 4  5  3  4  5  4.2  2.1  4.3  3.1  5.1  4.2 ]

w = [ 0.005  0.005  0.006  0.01  0.01  0.03  0.01  0.01  0.005  0.008  0.005 ]